Leveraging a national biorepository in Zambia to assess measles and rubella immunity gaps across age and space

High-quality, representative serological surveys allow direct estimates of immunity profiles to inform vaccination strategies but can be costly and logistically challenging. Leveraging residual serum samples is one way to increase their feasibility. We subsampled 9854 residual sera from a 2016 national HIV survey in Zambia and tested these specimens for anti-measles and anti-rubella virus IgG antibodies using indirect enzyme immunoassays. We demonstrate innovative methods for sampling residual sera and analyzing seroprevalence data, as well as the value of seroprevalence estimates to understand and control measles and rubella. National measles and rubella seroprevalence for individuals younger than 50 years was 82.8% (95% CI 81.6, 83.9%) and 74.9% (95% CI 73.7, 76.0%), respectively. Despite a successful childhood vaccination program, measles immunity gaps persisted across age groups and districts, indicating the need for additional activities to complement routine immunization. Prior to vaccine introduction, we estimated a rubella burden of 96 congenital rubella syndrome cases per 100,000 live births. Residual samples from large-scale surveys can reduce the cost and challenges of conducting serosurveys, and multiple pathogens can be tested. Procedures to access quality specimens, ensure ethical approvals, and link sociodemographic data can improve the timeliness and value of results.

To ensure sufficient sample size for age-specific seroprevalence estimates, participants younger than 5, 5-9, and 10-14 years of age were selected to each represent 22% of the subsample in each province, and participants 15-19 and 20-49 years were selected to each represent 17% of the subsample in each province (Fig. SM1). At least one respondent from each cluster was included. Not all selected specimens were able to be tested due to non-availability (14%) or lack of consent for future testing (<1%).  Figure SM1. Flow diagram for the selection of specimens from biorepository into subsample of specimens that were tested for measles and rubella antibodies. Weighting We calculated a weight for each specimen to be used in survey estimation where the confidence intervals were calculated using linearized Taylor series variance estimation. The weight was calculated based on probability of selection, nonresponse, trimming, and post-stratification.

Probability of selection
Weights from the ZAMPHIA project for all 11,500 PIRMZ samples selected were post-stratified to be nationally representative by scaling them to match 2016 population projected estimates from the Zambia Central Statistical Office in each stratum defined by province and age group. 1 These representative weights were inverted to calculate the cumulative probabilities of selection for each stage of the ZAMPHIA project. Those cumulative probabilities were multiplied by the probability of PIRMZ selection from the ZAMPHIA dataset. The updated product represents the cumulative probability of selection of all stages of ZAMPHIA and selection for PIRMZ. The updated cumulative probabilities were inverted once again to construct PIRMZ base weights.

Nonresponse
We adjusted for nonavailability of specimens through a nonresponse adjustment. To maintain the representativeness of the PIRMZ sample, the sampling weights were adjusted to account for blood nonavailability and for non-consent for further testing. Logistic regression indicated that availability of specimens differed to a statistically significant degree across adjustment cells defined by province and age group but not by sex. Adjustment cells were defined by province and age group, and within each cell, the sum of weights from all non-respondents (those without specimens available for testing) were shifted to respondents with specimens available. The weights of available specimens were adjusted upward and the weights of non-available specimens were set to zero. After this adjustment, the sum of weights in each adjustment cell was equal to the sum of weights before the adjustment, but after the adjustment, all the weight is carried by available specimens that were successfully tested in the PIRMZ study.

Trimming weights
To be consistent with the weighting procedures from ZAMPHIA, weights that were larger than 3.5 times the median weight value in strata defined by province, age group, and sex were trimmed back and set equal to 3.5 times the median. A total of 47 respondents had outlier weights that were trimmed; all other weights remained the same.

Post-stratification
Finally, the trimmed weights were post-stratified so the sum of weights for specimens with PIRMZ testing results in each stratum defined by province, sex, and age group would equal the population age structure projections for 2016 provided by Zambia's Central Statistical Office, which were based on the 2010 census. 1 The original qualitative interpretation for the measles IgG EIA kits were revised by the manufacturer for population-level seroprevalence studies like ours. Whereas the original values of ≥275 mIU/ml were classified positive, values of ≥200 to <275 were classified equivocal, and values of <200 were classified negative; we used the revised thresholds of ≥200 mIU/ml were classified positive, values of ≥150 to <200 were classified equivocal, and values of <150 were classified negative

Description of laboratory procedures
Measles and rubella samples above the top calibrator 1 were set at the upper limit of detection (5,000 mIU/ml and 200 IU/ml respectively). Similarly, samples below the lower limit of detection were set to the lower limit of detection (8 mIU/ml and 0.3 IU/ml respectively).

Quality control and quality assurance
We monitored laboratory results in near time, analyzing uploaded data daily. This included checking for plate validity, intra-plate concordance and inter-plate concordance. We also monitored inter-technician variability and inter-laboratory variability. The Euroimmune IgG ELISA kit contains positive and negative controls that serve as internal controls for the reliability of test procedures. We added 2 additional internal controls (one high positive and one equivocal/negative for measles and rubella) to every plate to monitor reliability across all the plates. Invalid plates, as defined by the manufacturer were rerun.
We systematically retested specimens to assess intra-and inter-plate variability and reproducibility. Eight specimens on each plate were predefined to be run in duplicate for intraplate variability. Equivocal results were re-tested and, classified based on the positive or negative value on retesting. If equivocal on retest, they remained classified as such. Equivocal results were categorized as positive for binary analyses. If there was discordance between positive and negative results upon retest, specimens were retested a third time and classified based on the two concurrent results.

Performance of DBS compared to plasma
Because only DBS were available for participants under 2 years of age, we validated that the DBS provided consistent results with plasma. We selected 100 plasma specimens that had paired DBS available, because they were prepared as part of HIV testing procedures for ZamPHIA, and tested both the plasma and DBS to compare.
We tested 3 different protocols to optimize the protocol for DBS elution. 4 The final protocol involved the circumference of each DBS being measured, punched with sterilized 6mm hole punch, and serum being eluted with 450 µL of buffer, adapted from the manufacturer's recommendation. 5 One hundred microliters of eluted sample were transferred to precoated 96-well plates and the manufacturer's protocol was followed to perform the EIA, as was done with plasma.
The results demonstrated consistent readings across both the DBS and plasma for measles and rubella IgG ( These results suggested no adjustments were needed for the DBS tested. There were similar findings in India, where sensitivity of antibody detection by DBS was greater than 98%, and specificity was 90% and 98%, for measles and rubella IgG, respectively. 6 The nested serosurvey was sampled to represent provinces by age-group, therefore to characterize agespecific curves by province we took into account the hierarchical structure of the data. We fit hierarchical generalized additive models to individual measles and rubella seropositivity by age. The evaluated models accounted for the data structure by including a single national level smoother over age applied to all provinces. 7 where g is a link function, f is a smoothing function over age, and fprovince is a smoother for age for a given province. Specimens from individuals younger than 9 months of age with potential maternally-derived antibodies were excluded to improve model fit. The R package mgcv was used to fit the hierarchical generalized additive models given its computational efficiency, automated selection of the smoothness parameter, and goodness of fit to the data. 8 The models assumed a binomial probability distribution for seropositivity and included isotropic smoothers on age. We compared models with different smoothing bases (cubic regression splines and thin plate regression splines), and with different link functions (complementary log-log and log odds). The choice of the basis dimension in each model was large enough to have sufficient degrees of freedom to represent the underlying data. 8 Final models for measles and rubella were selected by minimizing the Akaike Information Criterion (AIC).

Using Indirect Methods to Reconstruct Measles National Age-Specific Immunity Profiles
In addition to the directly estimated national age-specific profile of measles immunity using serological data described above, we also indirectly reconstructed national age-profile of measles immunity. The indirect analysis, modified methods from Takahashi et al. 2015, 9 estimates the proportion of each birth cohort that is immune based on probabilities of vaccination (routine and campaigns) taking into account vaccine effectiveness, risk of natural infection, and probability of maternally derived immunity. For example, if 80% of the birth cohort was routinely vaccinated and the cumulative measles attack rate was 75% among those unvaccinated, then 95% of the cohort would be estimated as immune and 5% would be susceptible. This is illustrative, and a simplification as it assumes 100% vaccine effectiveness by age, a time constant cumulative force of infection, and no opportunity for a vaccination campaign.
We assumed that routine vaccination coverage rates were equivalent to the World Health Organization and United National Children's Fund (WUENIC) estimates for routine MCV1 and MCV2 coverage rates for Zambia 1983-2016 and 2014-2016, respectively. 10 We assumed vaccine effectiveness for MCV1 and MCV2 was 85% and 95%, respectively. 11 We assumed dependence between routine doses such that the probability of being successfully routinely vaccinated within a birth cohort was where VE stands for vaccine effectiveness.
Vaccination campaign timing, age ranges, and coverage rates were extracted from WHO reported administrative estimates. 12 Zambia has conducted four national measles vaccination campaigns (Table  SM2), we assumed maximum coverage of a campaign was 99%. We assumed campaign vaccine effectiveness of 95% among birth cohorts who were at least 12 months at the time of the campaign. If the campaign minimum target age was 6 or 9 months old we assumed 74% and 89% vaccine effectiveness among birth cohorts who were 6-11 or 9-11 months at the time of the campaign, respectively. 11 We assumed that if a birth cohort had experienced vaccination via a campaign as well as routine vaccination, the probability of successful vaccination in that cohort was taken as the higher of the two values: the probability of being successfully routinely vaccinated or the probability of being successfully vaccinated in a campaign. This simplification is similar to assuming 100% correlation between routine and campaign vaccination, however it does not allow for the probability of successful vaccination following a primary vaccine failure. Given the very high rates of campaign coverage, the results are robust to this analytic simplification. The proportion of infants with maternally derived immunity was estimated as the mean of exp (−0.45 ), where a is age in months from 1 to 12. 13 This is equal to 14.6% of the infant population with maternally derived immunity.

Year
The probability of immunity from natural infection for each birth cohort was estimated using the catalytic model taking into account the annual forces of infection each birth cohort was exposed to over their lifetime. The probability of immunity for each age a is defined as, Using Indirect Methods to Reconstruct Provincial Measles Immunity Profiles in Individuals 10 months to 4 years old To reconstruct provincial age-specific immunity profiles we estimated the proportion of each birth cohort that is immune based on routine vaccination only, taking into account vaccine effectiveness. We focused on birth cohorts born 2013 to 2016 who would not have been eligible for the 2012 vaccination campaign and assumed they had no risk of natural infection over this time period given the small number of measles cases reported in Zambia since 2016 (an average of 11 annual reported cases between 2016 and 2019). 15 We relied on Zambian administrative measles vaccination data (i.e., the reported number of measles vaccines for each dose delivered each year) by province as the numerator by which to estimate vaccination coverage per birth cohort. We relied on population projections using a cohort component model calculated by the Zambian Statistical Office for total and age-specific provincial population sizes to estimate the denominator (i.e., the number eligible for each vaccine dose). 1 The "traditional" method used by the Zambian EPI program, assumes that the population eligible for MCV1 vaccination is 4% of the total provincial population size assuming a homogeneous birth rate of close to 40/1000 population and ignoring mortality before the age for routine MCV1, and the population eligible for MCV2 vaccination is 8% of the total provincial population size. The "revised" method assumes that the population eligible for MCV1 vaccination is the number of individuals listed as age 0 in each province (which takes into account province-specific births and infant mortality rates), and the population eligible for MCV2 vaccination is the number of individuals listed as age 1 in each province. We assumed routine doses were dependent (treating MCV2 as a true second dose) and that vaccine effectiveness for MCV1 and MCV2 was 85% and 95%, respectively. 11 We are estimating immunity for individuals over 9 months old, therefore do not consider maternally derived immunity.

Measles District Age-Specific Seroprevalence
To estimate district-specific measles seroprevalence, hierarchical spatial models were fit to individual measles seropositivity. District-specific random effects were included in the model based on a conditional autoregressive (CAR) specification in which districts adjacent to one another were assumed to be more similar than districts not adjacent. The models assumed a binomial probability distribution for seropositivity and a log odds link.
We explored epidemiological and demographic model covariates. Covariates of HIV positivity, age, district, and province of residence were extracted from the ZAMPHIA questionnaire and linked directly to the serum samples. We explored demographic and measles epidemiological covariates that were linked to serum samples based on the age and district of residence of the individual sampled. These included covariates from Zambia's EPI program such as district and year-specific MCV1 and MCV2 administrative coverage or eligibility, district and year-specific outbreak risk defined as at least one, two, or three annual measles-specific IgM positive cases, and campaign coverage eligibility and provincespecific 2012 vaccination campaign coverage (we did not have access to province-specific campaign coverage for campaigns prior to 2012). We also evaluated district specific population density, district and age-specific MCV1 vaccination coverage from Zambia's 2018 Demographic and Health Survey (DHS) that we estimated using geospatial modeling techniques, 9 and district-specific MCV1 vaccination coverage estimates per Institute for Health Metrics and Evaluation Global Health Data Exchange. 16 The final model, described below, was selected by minimizing the Widely Applicable Information Criterion (WAIC).
Indexes i, j, and k represent the individual, district, and province from which the serum sample was collected. HIVpos is a binary variable based on HIV positivity, age is age in years, mcv1 is the MCV1 vaccination coverage estimated from DHS data, under4 is a binary variable classifying individuals under four years of age, mcv2eligible is a binary variable indicating whether the individual was eligible for MCV2 (introduced in 2012), mcv2 is MCV2 vaccination coverage estimated from administrative data, SIAeligible is a binary variable classifying an individual as eligible for the 2012 measles vaccination campaign that targeted individuals 6 months through 14 years old, Outbreak is a binary variable indicating whether an individual was exposed to a district-specific measles outbreak since 2012. We allowed provincial specific effects of the 2012 vaccination campaign eligibility. We defined an outbreak within a district as two or more measles-specific IgM positive cases reported within a year based on data collected by Zambia EPI program. We also included interactions between HIV serostatus and age, as well as HIV serostatus and squared age.
To estimate γ j we assumed that it is multivariate normally distributed with a location-specific mean of , = 0 and a spatially-independent standard deviation of conditional autoregressive model (CAR) specification for the spatial random effects, parametrized by the precision matrix, 1 2 ⁄ : Above, we have the precision matrix 1 2 ⁄ , is a precision parameter, controls the spatial dependence ( =0 implies spatial independence, and =1 collapses to an intrinsic conditional autoregressive model), D is an j by j diagonal matrix with diagonal elements encoding the number of adjacent neighbors that each district has, W is a binary adjacency matrix. This specification means that estimated seroprevalence at any given district is conditional on the estimated seroprevalence of neighboring districts.
The model was fit using Stan in R using two chains with 5000 iterations per chain (half of all iterations were burn-in) until model convergence was achieved. 17 The posterior distribution of alpha collapsed to 1, resulting in an intrinsic conditional autoregressive model (Fig. SM3). Figure SM4 displays model estimates of gamma (district-level intercepts), where we find intra and inter-district variation. The partial pooling or regularization effect of the hierarchical spatial model is displayed in Figure SM5; districts with low seroprevalence are pulled upwards and districts with high seroprevalence are pulled downward, shrinking all estimates towards the mean. Figure S6 displays estimates for the beta parameters. HIV positive individuals had significantly lower seroprevalence (beta0), and among individuals less than four years old we find individuals exposed to higher vaccination coverage rates within the respective district and birth cohort had significantly higher seroprevalence (beta5). The impact of being eligible for the most recent SIA on seroprevalence differed by province (beta7); Northern and Western provinces had a larger positive impact of the SIA than other provinces in the country.
We conducted leave-one-out cross validation analyses to evaluate the performance of the model to estimate district and age-specific seroprevalence. In these analyses we left out each district or age in years, retrained the model with the smaller dataset, and then used the new posterior estimates of the parameter values to predict seroprevalence for the district or age originally left out (Fig. SM7-SM8). The model does well to predict seroprevalence for missing ages, but not to predict seroprevalence for missing districts. This finding is expected, given our reliance on the model district-specific random effects (gamma parameter) that captures variation not explained by our demographic and epidemiologic covariates. We find that extrapolating the gamma parameter for a missing district does a poor job to capture observed district seroprevalence. Fortunately, we do not extrapolate our model to predict seroprevalence any new districts, and only extrapolate to missing ages within a district. As a result, the model is suitable to answer the question at hand (i.e., district and age-specific seroprevalence for the 72 districts represented in the data).
The final step to estimate district-specific seroprevalence is to weight the seroprevalence estimates by district-specific population characteristics. We created a new dataset with all possible covariate groupings. For example, one possible covariate grouping is district Chadiza, HIV negative, 7 years old which is associated with not being under four years old, not eligible for MCV2, eligible for the 2012 SIA in Eastern Province, and exposure to a local outbreak. We estimated the probability of seropositivity for each covariate grouping across 2500 samples from parameter posterior distribution sets, taking into account both uncertainty of the mean parameter values and uncertainty in the sampling process. We then weighted the probability of seropositivity by each covariate grouping and sampled parameter set per district by the proportion of individuals in that covariate grouping to get 2500 estimates of seroprevalence for each district. Figure 1C in the main text shows the final results.      Figure SM8. Results of leave out age analysis. Each point represents a different age in years. Left figure displays the estimated mean seroprevalence for a predicted ages left out of the analysis (y-axis) by the expected seroprevalence given the age was included in the analysis (x-axis). Right figure displays the estimated mean seroprevalence for a predicted ages left out of the analysis (y-axis) by the observed seroprevalence for the respective age (x-axis). Dashed blue line is fit line and red solid line represents perfect agreement.

Measles Outbreak Risk 2016-2019
We evaluated Zambia's national measles outbreak risk 2016 to 2019 by estimating measles effective reproduction number (Reff) 2016 to 2019. Reff is the average number of secondary cases per infectious individual. If Reff is over one, cases can increase and there is risk of an outbreak. If Reff is less than one, the number of cases will decline and transmission will eventually cease. We estimate Reff as the dominant eigen value of the next generation matrix, . 18 The next generation matrix is defined as where = (s 1 , s 2 , s 3 , … s n ) as a vector of number susceptible individuals in n age groups, N is the total size of the Zambian population under 50 years old, and is the who acquires infection from whom (WAIFW) matrix that is n x n dimensions. The WAIFW matrix was calculated by scaling an inferred agecontact matrix for Zambia derived by Prem et al. 2017 19 , such that the dominant eigen value of the next generation matrix calculated from the inferred age-contacts and 2015 Zambian age structure was equal to the conservatively assumed measles basic reproduction number of 12 (although estimates of the basic reproduction number vary substantially 20 ). We assumed was constant between 2016 and 2019; however we estimated N and each year 2016 to 2019. The total size of Zambia's population under 50 years old from 2016 to 2019 was estimated from United Nations Population Projections for Zambia. 21 The number of susceptible individuals per age group in 2016 was estimated directly from the 2016 seroprevalence data and Zambia's estimated age structure per United Nations Population Projections. 21 The number of susceptible individuals per age group in 2017 to 2019 was indirectly estimated from the 2016 seroprevalence data.
Methods developed by Funk et al. 2019 relied on cross-sectional serological data to estimate future seroprevalence. 22 This innovative approach extends the utility of seroprevalence data beyond the year of collection, and was relied on to estimate measles national age-specific seroprevalence 2017 to 2019. We focused on immunity due to vaccination only, assuming immunity due to natural infection would have no meaningful impact at the population level given the low number of measles cases reported in Zambia since 2016 (an average of 11 annual reported cases between 2016 and 2019). 15 We assumed that immunity for new birth cohorts eligible for routine MCV1 was given by a routine vaccination scaling factor multiplied with the reported coverage in that year. We assumed that additional immunity from the 2016 campaign occurring after serological data collection was given by a vaccination campaign scaling factor multiplied with the reported campaign coverage. The scaling factors were estimated as the ratio of the observed seroprevalence for eligible birth cohort(s) and the level of reported coverage. For the 2016 campaign and second dose of measles, we assumed that the vaccine was preferentially given to those that had received the first dose of the vaccine. For the routine measles second dose we additionally took into account vaccine effectiveness. MCV1 and MCV2 coverage rates for Zambia were extracted from the World Health Organization and United National Children's Fund (WUENIC) estimates. 10 The 2012 and 2016 measles vaccination campaign coverage estimates were extracted from WHO survey estimates of 96% and 95%, respectively. 12 By relying on scaling factors for routine MCV1 and campaign, we are assuming i) the relationship between estimated coverage and seroprevalence remains constant over time and ii) that seropositivity is only a result of the routine or campaign vaccination. The result of assumption ii is likely to under-estimate the scaling factor and over-estimate the impact of vaccinations on seroprevalence; therefore reducing estimates of Reff. Additionally, the conservative assumption of scaling the WAIFW to a basic reproduction number of 12 rather than 15 or 20 or higher, also results in a lower estimated Reff. , where ( ) is age-profile of proportion seropositive estimated from the generalized additive model above, a is age in years and ranging from 0.83 (i.e., 10 months old to ignore passively acquired immunity) to age 49. We estimated the average age was infection of 8.59 (95% CI 6.5, 10.77).
The estimated CRS rate (CRS incident cases per 100,000 live births) for each reproductive age in years  and province was estimated by , = (1 − , ) × (1 − −16 , /52 ) × 0.65 × 100,000, where , is the estimated seroprevalence at age a and province p, , is the estimated force of infection at age a and province p. The force of infection was defined as , ′/ (1 − , ). 23 We assumed that 65% of infants born to women infected during the first 16 weeks of pregnancy would be born with CRS. 24 The provincial total CRS incidence per 100,000 live births was calculated as the mean of CRS rate in each reproductive age in years weighted by the number of births in each reproductive age in years. It is defined as where , is the annual number of births to women of age a and province p. The annual age and province specific number of births was estimated using Zambia's Central Statistical Office projected 2016 provincial total number of births and national age-specific fertility rates, 1 such that the mothers' age distribution of the total provincial number of births was determined by the national age specific fertility rate.

Chronology and operationalization of serosurvey
Using a pre-existing biorepository had its own challenges even if the complexities of specimen collection were circumvented. Additional policies and procedures required to access specimens collected by another organization had to be fulfilled. These included making amendments to the original protocol and obtaining ethical approvals from all the institutions that were involved in the original ZAMPHIA study. By the time of our study the procedures for accessing the biorepository were not yet in place and this added an extra layer of complication and required a lot of negotiating with all the ZAMPHIA stakeholders. Additionally, interpreting and accessing sociodemographic data from the survey was delayed due to unclear policies and coordination between organizations. The table below summarizes timing of study activities (Table SM3). These HIV Impact Assessments are being conducted in at least 15 low and middle income countries, so this process could be replicated. 25  Table S3: Measles and rubella survey weighted seroprevalence by sex. There is a significant interaction between sex and age (see Figure S2). significance levels: *** 0.001; ** 0.01; * 0.05  Table S4: Measles and rubella survey weighted seroprevalence by age group (years) and province. See figures S1 and S7 for histograms of these numbers. Figure S1: Measles survey weighted seroprevalence by age group and province Figure S2: Measles survey weighted seroprevalence by age group and sex. Males had significantly lower measles seroprevalence than females in all age groups younger than 20 years; there were no significant differences in seroprevalence by sex in the 20-49 year age group. significance levels: *** 0.001; ** 0.01; * 0.05; ns not significant  Figure S4: Estimated measles immunity by province in children 10 months to 4 years old. Indirect estimate of immunity based on MCV1 and MCV2 administrative (admin) coverage by year and province and vaccine efficacy (VE). The "revised" method (blue line) assumes that the population eligible for MCV1 vaccination is the number of individuals listed as age 0 in each province, and the population eligible for MCV2 vaccination is the number of individuals listed as age 1 in each province. The "traditional" method (black line) assumes that the population eligible for MCV1 vaccination is 4% of the total provincial population size, and the population eligible for MCV2 vaccination is 8% of the total provincial population size. Direct seroprevalence estimate as the survey weighted seroprevalence in the age group 10 months to 4 years (red line). Figure S5: Comparison of provincial rank of estimated measles immunity in children 10 months to 4 years old for between seroprevalence and two different indirect estimates of immunity: "Revised Method" (left) and "Traditional Method" (right) (for further detail see Supplemental Methods or Figure S4). "Seroprevalence" is the survey weighted seroprevalence in the age group 10 months to 4 years. Each point represents a different province (10 total provinces) ranked from lowest (1) to highest (10) immunity estimate.   The number of samples for each age in months from 0 to 9 months were fairly small including 7,21,20,26,19,23,33,27,29, and 41 samples, respectively. Little inference can be made from these small samples, but there is a generally decreasing trend in measles seroprevalence between 0 and 5 or 6 months of age for both pathogens.